DNA SEQUENCE ENCODING ENZYMES OF CLAVULANIC ACID
BIOSYNTHESIS



This invention relates to methods for the production 
of the antibiotic, clavulanic acid. 

Background of the Invention 

Clavulanic acid is a broad spectrum beta-lactamase
inhibitor and is an important antibiotic for the
treatment of infectious diseases. It is produced
commercially by the gram-positive mycelial prokaryote
Streptomyces clavuligerus, which also produces the
beta-lactam antibiotics penicillin N, desacetoxy
cephalosporin C and cephamycin C. Until recently,
however, the pathway employed for clavulanic acid
biosynthesis was much less well understood than the
pathways leading to these other antibiotics.

Without knowledge of the pathway for clavulanic acid
biosynthesis, it was not possible to isolate the genes
coding for the key enzymes and to manipulate these genes
to increase antibiotic yield or permit production of the
antibiotic in heterologous systems.

One of the earliest enzymes of the pathway to be
purified and characterised was clavaminic acid synthase.
Two isozymes have now been identified and characterised
(Marsh et al. (1992), Biochem., vol. 31, pp. 12648-657).

European Patent Application 0349121 describes a DNA
restriction fragment encoding a portion of the genetic
information involved in clavulanic acid synthesis but
provides no sequence information.

Until the work of the present inventors, the
complete complement of genes required for clavulanic acid
synthesis had not been identified. The present inventors
have now isolated, cloned and sequenced an 11.6 kb
genomic DNA sequence from S. clavuligerus which codes for
eight proteins and enables the production of clavulanic



2108113 



Figure 7 shows an alignment of the amino acid
sequence of CLA (S. clavuligerus CLA) with those of E.
coli agmatine ureohydrolase (E. coli AUH), yeast
arginase (yeast ARG), rat arginase (rat ARG) and human
arginase (human ARG).

Figure 8 shows a Southern blot of NcoI digests of
genomic DNA from five presumptive mutants (lanes 1-5) and
from wild-type S. clavuligerus (lane 6). Panel A:
membranes probed with cla-specific probe. Panel B:
membranes probed with tsr-specific probe.

Figure 9 shows restriction enzyme maps of S.
clavuligerus DNA inserts in cosmids. A. Restriction
enzyme map of cosmid K6L2. B. Partial restriction
enzyme map of cosmid K8L2. C. Restriction map of
cosmids K6L2 and K8L2 indicating location of pcbC gene in
relation to cla. D. The 2.0 kb NcoI fragment
encompassing the cla gene used in generating nested
deletions for sequencing. Abbreviations: Ba, BamHI;
B, BglII; E, EcoRI; K, KpnI; N, NcoI; S, SalI; and Sm, SmaI.

Figure 10 shows the deduced amino acid sequence
(Sequence ID No.: 3) of ORF1 of Figure 2.

Figure 11 shows the deduced amino acid sequence
(Sequence ID No.: 4) of ORF2 of Figure 2.

Figure 12 shows the deduced amino acid sequence
(Sequence ID No.: 5) of ORF3 of Figure 2.

Figure 13 shows the deduced amino acid sequence
(Sequence ID No.: 6) of ORF4 of Figure 2.

Figure 14 shows the deduced amino acid sequence
(Sequence ID No.: 7) of ORF5 of Figure 2.

Figure 15 shows the deduced amino acid sequence
(Sequence ID No.: 8) of ORF6 of Figure 2.

Figure 16 shows the deduced amino acid sequence
(Sequence ID No.: 9) of ORF7 of Figure 2.

Figure 17 shows the deduced amino acid sequence
(Sequence ID No.: 10) of ORF8 of Figure 2.

Figure 18 shows the deduced amino acid sequence
(Sequence ID No.: 11) of ORF9 of Figure 2.

when introduced into the non-clavulanate producer S.
lividans as described in Example 4, enabled that species
to produce clavulanic acid. This indicates that the 11.6 
kb fragment contains all the genetic information required 
for clavulanate production. 

As will be understood by those skilled in the art, 
the identification of the DNA sequence encoding the 
enzymes required for clavulanate synthesis will permit 
genetic manipulations to modify or enhance clavulanate 
production. For example, clavulanate production by S.
clavuligerus may be modified by introduction of extra
copies of the gene or genes for rate limiting enzymes or 
by alteration of the regulatory components controlling 
expression of the genes for the clavulanate pathway. 

Heterologous organisms which do not normally 
produce clavulanate may also be enabled to produce 
clavulanate by introduction, for example, of the 11.6 kb 
DNA sequence of the invention by techniques which are 
well known in the art, as exemplified herein by the 
production of S. lividans strains capable of clavulanate
synthesis. Such heterologous production of clavulanic
acid provides a means of producing clavulanic acid free
of other contaminating clavams which are produced by
S. clavuligerus.

Suitable vectors and hosts will be known to those 
skilled in the art; suitable vectors include pIJ702,
pJOE829 and pIJ922 and suitable hosts include S.
lividans, S. parvulus, S. griseofulvus, S. antibioticus
and S. lipmanii.

Additionally, the DNA sequences of the invention 
enable the production of one or more of the enzymes of 
the clavulanate pathway by expression of the relevant 
gene or genes in a heterologous expression system. 

The DNA sequences coding for one or more of the 
pathway enzymes may be introduced into suitable vectors 
and hosts by conventional techniques known to those 
skilled in the art. Suitable vectors include pUC118/119 

shown in Figure 3. ORF 4 corresponds to cla. ORFs 1, 7 and
8 are oriented in the opposite direction to pcbC. ORFs
2-6 and ORF 10 are all oriented in the same direction as
pcbC. ORFs 2 and 3, and ORFs 4 and 5 are separated by
very short intergenic regions suggesting the possibility
of transcriptional and translational coupling. Table 1
summarises the nucleotide sequences and lengths of ORFs
1-10.
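
As a rough illustration of how short these intergenic regions are, the spacing between adjacent ORFs can be computed from the nucleotide coordinates of Figure 2 recited in the claims. This is only a sketch: the assignment of each coordinate range to an ORF number is an assumption inferred from the "End of ORF" and "Beginning of ORF" annotations in Figure 2, and the function name is illustrative.

```python
# Coordinates are 1-based and inclusive, taken from the nucleotide ranges of
# Figure 2 recited in the claims. The ORF labels are an ASSUMPTION inferred
# from the "End of ORF"/"Beginning of ORF" annotations in Figure 2.
ORF_COORDS = {
    "ORF2": (2216, 3937),
    "ORF3": (3940, 5481),
    "ORF4": (5654, 6595),
    "ORF5": (6611, 7588),
}


def intergenic_gap(upstream, downstream):
    """Nucleotides lying strictly between the end of one ORF and the start of the next."""
    return downstream[0] - upstream[1] - 1


# Gap sizes for each pair of neighbouring ORFs on the same strand.
GAPS = {
    (a, b): intergenic_gap(ORF_COORDS[a], ORF_COORDS[b])
    for a, b in [("ORF2", "ORF3"), ("ORF3", "ORF4"), ("ORF4", "ORF5")]
}

for pair, gap in GAPS.items():
    print(pair, gap)
```

Under these assumptions the ORF 2/ORF 3 and ORF 4/ORF 5 pairs are separated by far fewer nucleotides than ORF 3 and ORF 4, which is the pattern the text associates with possible coupling.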

When the predicted amino acid sequences of proteins
encoded by ORFs 1-10 were compared to protein sequence
databases, some similarities were noted in addition to
the already mentioned similarity between CLA and enzymes
of arginine metabolism. ORF 1 showed a low
level of similarity to penicillin binding proteins from
several different microorganisms which are notable for
their resistance to β-lactam compounds.

An EcoRI fragment of the 15 kb DNA sequence,
containing 11.6 kb DNA, was cloned into a high copy
number shuttle vector and introduced into S. lividans, as
described in Example 4. Of seventeen transformants
examined, two were able to produce clavulanic acid,
indicating that the 11.6 kb fragment contains all the
necessary genetic information for clavulanic acid
production.

This 11.6 kb fragment encompasses ORF 2 to ORF 9 of
the 15 kb DNA sequence.
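
The size of this fragment, and which ORFs fall inside it, can be checked against the coordinates recited in the claims. The sketch below assumes that the fragment corresponds to nucleotides 2033 to 13636 of Figure 2 (claim 2) and that the individually claimed ranges correspond to ORFs 1 to 6 in order; both mappings are assumptions, and only the six ranges available in the claims are tested.

```python
# Fragment coordinates (1-based, inclusive) as recited in claim 2; treating
# this range as the 11.6 kb fragment is an ASSUMPTION.
FRAGMENT = (2033, 13636)

# Ranges recited in claims 3 to 8; the ORF labels are an ASSUMPTION.
ORFS = {
    "ORF1": (109, 1764),
    "ORF2": (2216, 3937),
    "ORF3": (3940, 5481),
    "ORF4": (5654, 6595),
    "ORF5": (6611, 7588),
    "ORF6": (7895, 9076),
}


def span_length(coords):
    """Length in nucleotides of an inclusive 1-based coordinate range."""
    start, end = coords
    return end - start + 1


def lies_within(inner, outer):
    """True when the inner range is completely contained in the outer range."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


FRAGMENT_BP = span_length(FRAGMENT)
INSIDE = [name for name, coords in ORFS.items() if lies_within(coords, FRAGMENT)]

print(f"fragment: {FRAGMENT_BP} bp ({FRAGMENT_BP / 1000:.1f} kb)")
print("ORFs inside fragment:", INSIDE)
```

With these inputs the fragment comes out at about 11.6 kb and excludes the first claimed range, in line with the statement that the fragment begins at ORF 2.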

ORF 2 shows a high degree of similarity to
acetohydroxyacid synthase (AHAS) enzymes from various
sources. AHAS catalyses an essential step in the
biosynthesis of branched chain amino acids. Since valine
is a precursor of penicillin and cephamycin antibiotics,
and valine production is often subject to feedback
regulation, it is possible that a deregulated form of
AHAS is produced to provide valine during the antibiotic
production phase. Alternatively, an AHAS-like activity
may be involved in clavulanic acid production. While the
presently recognized intermediates in the clavulanic acid

EXAMPLES

Example 1

Bacterial strains, vectors and growth conditions.
Streptomyces clavuligerus NRRL 3585, Streptomyces
jumonjinensis NRRL 5741, Streptomyces lipmanii
NRRL 3584, Streptomyces griseus NRRL 3851, Nocardia
lactamdurans NRRL 3802 and Streptomyces cattleya NRRL
3841 were provided by the Northern Regional Research
Laboratories, Peoria, IL. Streptomyces antibioticus ATCC
8663 and Streptomyces fradiae ATCC 19609 were obtained
from the American Type Culture Collection, Rockville, MD.
Streptomyces lividans strains 1326 and TK24 were provided
by D.A. Hopwood (John Innes Institute, Norwich, U.K.).
Streptomyces venezuelae 13s and Streptomyces griseofuscus
NRRL B-5429 were obtained from L.C. Vining (Department of
Biology, Dalhousie University, Halifax, N.S.). Cultures
were maintained on either MYM (Stuttard (1982) J. Gen.
Microbiol., v. 128, pp. 115-121) or on a modified R5
medium (Hopwood et al. (1985) in "Genetic Manipulation of
Streptomyces: a laboratory manual", John Innes
Foundation, U.K.) containing maltose instead of glucose
and lacking sucrose (R5-S). Escherichia coli MV1193
(Zoller and Smith (1987) Methods in Enzymology, v. 154,
pp. 329-349), used as recipient for all of the cloning
and subcloning experiments, was grown in Luria Broth (LB;
Sambrook et al. (1989) in "Molecular Cloning: a
laboratory manual", Cold Spring Harbor, N.Y.) or on LB
agar (1.5%) plates containing ampicillin (50 μg/ml) or
tetracycline (10 μg/ml). The cloning vectors pUC118 and
pUC119 (Vieira and Messing (1987) Methods in Enzymology,
v. 153, pp. 3-11) were provided by J. Vieira (Waksman
Institute of Microbiology, Rutgers University,
Piscataway, N.J.). The plasmid vector pJOE829 was
generously provided by J. Altenbuchner (University of
Stuttgart, Stuttgart, Germany). The plasmid pIJ702 was
obtained from the American Type Culture Collection,

The probe was designed as an 8-fold degenerate
mixture of oligonucleotides to take into consideration
the biased codon usage of Streptomyces (Bibb et al.,
1984; Wright and Bibb (1992), Gene, v. 113, pp. 55-65).
End-labelled probe was then used to screen a cosmid
library of S. clavuligerus genomic DNA fragments as
described in Materials and Methods.
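
To make the notion of an 8-fold degenerate mixture concrete, the sketch below expands an oligonucleotide written with standard IUPAC ambiguity codes into every sequence it represents. The 24-mer used here is hypothetical and is not the probe of Figure 1; it was chosen only because three two-fold ambiguous positions give 2 x 2 x 2 = 8 variants.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(oligo):
    """Number of distinct sequences represented by a degenerate oligonucleotide."""
    total = 1
    for base in oligo:
        total *= len(IUPAC[base])
    return total


def expand(oligo):
    """List every unambiguous sequence encoded by a degenerate oligonucleotide."""
    return ["".join(combo) for combo in product(*(IUPAC[base] for base in oligo))]


# HYPOTHETICAL 24-mer for illustration only (not the probe of Figure 1):
# one R (A/G) and two S (C/G) positions give an 8-fold mixture.
HYPOTHETICAL_PROBE = "ATGGARCGSATCGACTCSCACGTC"

print(len(HYPOTHETICAL_PROBE), degeneracy(HYPOTHETICAL_PROBE))
for variant in expand(HYPOTHETICAL_PROBE):
    print(variant)
```

Restricting the ambiguous positions to G or C in the third codon position is one way a probe can reflect a GC-biased codon usage while keeping the mixture small.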

A library of S. clavuligerus genomic DNA fragments
(15-22 kb size fractionated fragments) was constructed as
previously described (Doran et al. (1990), J. Bacteriol.,
v. 172, pp. 4909-4918) using the cosmid vector pLAFR3.
A collection of 1084 isolated E. coli colonies containing
recombinant cosmids was screened for the presence of cla
using the 24-mer mixed oligonucleotide probe (Fig. 1)
which had been end-labelled with [γ-32P]dATP and
polynucleotide kinase (Boehringer Mannheim). Colony
hybridization and subsequent washing was performed as
described by Sambrook et al. (1989), at 55°C with a
final wash in 0.2X SSC (1X SSC: 0.15 M NaCl and 0.015 M
sodium citrate) and 0.1% SDS.
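
The salt concentration of a wash follows directly from the 1X SSC definition given above. A minimal sketch of the arithmetic, with an illustrative function name:

```python
# 1X SSC as defined in the text: 0.15 M NaCl and 0.015 M sodium citrate,
# expressed here in millimolar to keep the numbers readable.
SSC_1X_MILLIMOLAR = {"NaCl": 150.0, "sodium citrate": 15.0}


def ssc_millimolar(fold):
    """Solute concentrations (mM) of an SSC solution at the given fold strength."""
    return {
        solute: round(concentration * fold, 6)
        for solute, concentration in SSC_1X_MILLIMOLAR.items()
    }


# Final wash used for colony hybridization in this example.
FINAL_WASH = ssc_millimolar(0.2)
print(FINAL_WASH)
```

A lower fold strength means less salt and therefore a more stringent wash at a given temperature.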

Five colonies which gave strong hybridization
signals were isolated from the panel of 1084 clones, and
restriction analysis showed that the positive clones
contained overlapping fragments of DNA. Two clones, K6L2
and K8L2, with sequences that spanned about 40 kb of the
S. clavuligerus genome, were chosen for further analysis.
Clone K8L2 contained about 22 kb of S. clavuligerus
genomic DNA and included a portion of cla and all of the
pcbC gene which encodes IPNS in the penicillin/cephamycin
biosynthetic pathway. A restriction map of K6L2 is shown
in Fig. 9. Within the approximately 27 kb of DNA
contained in K6L2, the oligonucleotide probe hybridized
to a 2.0 kb NcoI fragment which was subsequently found to
contain the entire cla gene. Hybridization studies,
restriction mapping and DNA sequence analysis revealed
that cla was situated 5.67 kb downstream of the pcbC gene
of S. clavuligerus (Fig. 9).




program described above. The AUH sequence had previously
been aligned with the three ARG sequences (Szumanski &
Boyle (1990), J. Bacteriol., v. 172, pp. 538-547).
Identical matches in two or more sequences are indicated
with upper case letters.
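
The display convention just described can be sketched as follows. The function is illustrative and is not the alignment program referred to above; the three short sequences are toy data, not the sequences of Figure 7, and gaps are assumed to be written as "-".

```python
from collections import Counter


def mark_shared_residues(aligned):
    """Upper-case residues that occur in two or more sequences at the same
    alignment column; lower-case all other residues. Gaps ('-') are kept."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")

    rows = [[] for _ in aligned]
    for col in range(length):
        column = [seq[col].upper() for seq in aligned]
        # Count residues in this column, ignoring gap characters.
        counts = Counter(residue for residue in column if residue != "-")
        for row, residue in zip(rows, column):
            if residue != "-" and counts[residue] >= 2:
                row.append(residue)
            else:
                row.append(residue.lower())
    return ["".join(row) for row in rows]


# Toy alignment for illustration only.
TOY_ALIGNMENT = ["MKT-A", "MRTGA", "LKSGC"]
MARKED = mark_shared_residues(TOY_ALIGNMENT)
for line in MARKED:
    print(line)
```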



Example 2 

DNA hybridization 

Genomic DNA preparations from various Streptomyces
species were isolated as described by Hopwood et al.
(1985). For interspecies DNA hybridization analysis, 2.0
μg amounts of genomic DNA preparations were digested with
NcoI for 16 h, and electrophoresed in 1.0% agarose gels.
The separated DNA fragments were then transferred onto
nylon membranes (Hybond-N, Amersham) and hybridized with
a cla-specific probe prepared by labeling an internal 459
bp SalI fragment (Fig. 1) with [α-32P]dATP by nick
translation. Hybridization was done as described by
Sambrook et al. (1989). Hybridization membranes were
washed twice for 30 min in 2X SSC; 0.1% SDS and once for
30 min in 0.1X SSC; 0.1% SDS at 65°C.

Sequences homologous to cla in other Streptomycetes

Three of six producers of β-lactam antibiotics, S.
clavuligerus, S. lipmanii and S. jumonjinensis showed
positive hybridization signals whereas S. cattleya, S.
griseus, and N. lactamdurans did not (data not shown).
None of the nonproducing strains examined, S. venezuelae,
S. lividans, S. fradiae, S. antibioticus and S.
griseofuscus gave any signal. All of the streptomycetes
that gave positive signals were producers of clavam-type
metabolites (Elson et al., 1987).



Example 3

Disruption of the genomic cla gene

A 2.0 kb NcoI fragment that contained the entire cla
gene was digested at its unique KpnI site and the ends

bioassay procedures described previously (Jensen et al.
(1982), supra).

All of the resulting colonies with disrupted cla
genes grew equally well on minimal medium and complex
media and produced as much penicillin and cephamycin as
did the wild-type, but produced no clavulanic acid (data
not shown). HPLC analysis of cell supernatants confirmed
the inability of the disrupted cla mutants to synthesize
any clavulanic acid (data not shown).

Example 4

Protoplast formation and transformation

E. coli competent cell preparation and
transformation were as described by Sambrook et al.
(1989). Protoplasts of S. clavuligerus were prepared,
transformed and regenerated as described by Bailey et al.
(1984), Bio/Technology, v. 2, pp. 808-811, with the
following modifications. Dextrin and arginine in the
regeneration medium were replaced by starch and sodium
glutamate respectively. Protoplasts were heat shocked at
43°C for 5 min prior to the addition of DNA. Standard
procedures were used for protoplasting and transformation
of S. lividans (Hopwood et al. (1985)).

The 11.6 kb EcoRI fragment from K6L2 (Fig. 9) was
cloned into the EcoRI site of pCAT-119. pCAT-119 is a
derivative of pUC119 which was prepared by insertionally
inactivating the ampicillin resistance gene of pUC119 by
the insertion of a chloramphenicol acetyltransferase gene
(Jensen et al. (1989), Genetics & Molec. Biol. of Ind.
Microorg., pp. 239-245, Ed. Hershberger, Amer. Soc.
Microbiol.). The pCAT-119 plasmid carrying the 11.6 kb
fragment was then digested with PstI and ligated to the
Streptomyces plasmid pIJ702, which had also been digested
with PstI. The resulting bifunctional plasmid carrying
the 11.6 kb insert was capable of replicating in either E.
coli (with selection for chloramphenicol resistance) or
in S. lividans (with selection for thiostrepton

Example 5

Sequencing of 15 kb DNA fragment

Ordered sets of deletions were generated as
described in Example 1 using fragments of the DNA insert
from the cosmid clone K6L2 (Figure 9) and subcloned into
the E. coli plasmids pUC118 and pUC119. Overlapping
fragments were chosen which extended from the end of the
pcbC gene downstream for a distance of about 15 kb ending
at the BglII site. The deletion generated fragments were
sequenced in both orientations as described in Example 1.
The sequence is shown in Figure 2.



The present invention is not limited to the features 
of the embodiments described herein, but includes all 
variations and modifications within the scope of the 
claims. 

The embodiments of the invention in which an
exclusive property or privilege is claimed are defined as
follows:

1. An isolated genomic DNA molecule comprising the
nucleotide sequence of Figure 2 (Sequence ID No.: 1).

2. An isolated DNA molecule having the nucleotide
sequence of nucleotides 2033 to 13636 of Figure 2
(Sequence ID No.: 20).

3. An isolated DNA molecule having the nucleotide
sequence of nucleotides 109 to 1764 of Figure 2 (Sequence
ID No.: 21).

4. An isolated DNA molecule having the nucleotide
sequence of nucleotides 2216 to 3937 of Figure 2
(Sequence ID No.: 22).

5. An isolated DNA molecule having the nucleotide
sequence of nucleotides 3940 to 5481 of Figure 2
(Sequence ID No.: 23).

6. An isolated DNA molecule having the nucleotide
sequence of nucleotides 5654 to 6595 of Figure 2
(Sequence ID No.: 24).

7. An isolated DNA molecule having the nucleotide
sequence of nucleotides 6611 to 7588 of Figure 2
(Sequence ID No.: 25).

8. An isolated DNA molecule having the nucleotide
sequence of nucleotides 7895 to 9076 of Figure 2
(Sequence ID No.: 26).

18. An isolated DNA molecule comprising a
nucleotide sequence encoding the amino acid sequence of
Figure 15.

19. An isolated DNA molecule comprising a
nucleotide sequence encoding the amino acid sequence of
Figure 16.

20. An isolated DNA molecule comprising a
nucleotide sequence encoding the amino acid sequence of
Figure 17.

21. An isolated DNA molecule comprising a
nucleotide sequence encoding the amino acid sequence of
Figure 18.

22. An isolated DNA molecule comprising a
nucleotide sequence encoding the amino acid sequence of
Figure 19.

23. An isolated protein having the amino acid
sequence of Figure 10.

24. An isolated protein having the amino acid
sequence of Figure 11.

25. An isolated protein having the amino acid
sequence of Figure 12.

26. An isolated protein having the amino acid
sequence of Figure 13.

27. An isolated protein having the amino acid
sequence of Figure 14.

28. An isolated protein having the amino acid
sequence of Figure 15.

transforming the host with a DNA molecule comprising a
nucleotide sequence encoding one or more of the enzymes 
of the clavulanate synthetic pathway. 

FIGURE 2-1

61 ggagggggcg gccggcccgt ccggtgcgcg cggtgggtgc ggcgcgggTC AGCCGGCCGC 120
121 GAGGTTGCTG AGGAACTTCG CGGCGACGGG GCCCGCGTCG GCGCCGCCCG ACCCGCCGTC 180
181 CTCCAGCAGG ACCGACCAGG CGATGTTCCG GTCGCCCTGG TAGCCGATCA TCCAGGCGTG 240
241 CGTCTTCGGC GGCTTCTCGG TGCCGAACTC GGCGGTACCG GTCTTGGCGT GCGGCTGTCC 300
301 GCCGAGGCCC CGCAGGGCGT CGCCGGCGCC GTCGGTGACG GTCGAACGCA TCATGGAACG 360
361 CAGCGAGTCG ACGATGCCCG GGGCCATCCG GGGGGCCTGG TGCGGCTTCT TGACCGCGTC 420
421 GGGCACCAGC ACGGGCTGCT TGAACTCGCC CTGCTTGACG GTGGCGGCGA TGGAGGCCAT 480
481 CACCAGGGGC GACGCCTCGA CCCTGGCCTG TCCGATGGTG GACGCGGCCT TGTCGTTCTC 540
541 GCTGTTGGAG ACGGGGACGC TGCCGTCGAA GGTGGAGGCG CCGACGTCCC AGGTGCCGCC 600
601 GATGCCGAAG GCTTCGGCGG CCTGCTTCAG GCTGGACTCG GAGAGCTTGC TGCGGGAGTT 660
661 GACGAAGAAC GTGTTGCAGG AGTGGGCGAA GCTGTCCCGG AAGGTCGAGC CCGCGGGCAG 720
721 CGTGAACTGG TCCTGGTTCT CGAAGCTCTG GCCGTTGACA TGGGCGAACT TCGGGCAGTC 780
781 GGCCCGCTCC TCCSGGTTCA TCCCCTGCTG GAGCAGGGCC GCGGTGGTGA CCACCTTGAA 840
841 GGTGGAGCCe GGCGGGTAGC GGCCCTCCAG CSCGCGGTTC ATGCCGGAGG GCACGTTCGC 900
901 GGCGGCCAGG ATGTTGCCGG TGGCGGGGTC GACGGCGACG ATCGCCGCGT TCTTCTTCGA 960
961 GCCCTCCAGG GCCGCCGCGG CGGCGGACTG GACCCGCGGG TCGATGGTGG TCTTCACCGG 1020
1021 CTTGCCCTCG GTGTCCTTGA GGCCGGTGAC CTTCTTGACC ACCTGGCCGG ACTCACGGTC 1080
1081 CAGGATCACG ACCGACCGCG CCGCGCCGGA GCCGCCGGTG AGCTGCTTGT CGTAGCGGGA 1140
1141 CTGGAGGCCC GCCGAGCCCT TGCCGGTCCT GGGGTCGACC GCGCCGATGA TGGAGGCGGC 1200
1201 CTGGAGGACA TTGCCGTTGG CGTCGAGGAT GTCCGCGCGC TCCCGCGACT TGAGGGCGAG 1260
1261 GGTCTGCCCC GGAACCATCT GCGGATGGAT CATCTCGGTG TTGAACGCGA CCTTCCACTC 1320
1321 CTTGCCGCCG CCGACGACCT TCGCGGTGGA GTCCCAGGCG TACTCCCCGfi CCCCGGGGAG 1380
1381 GGTCATTCTG ACGGTGAACG GTATCTCCAC CTCGCCCTCG GGGTTCTTCT CCCCGGTCTT 1440
1441 GGCGGTGATC TCCGTCTTCG TCGGCTTGAG GTTGGTCATG ACGGATTTGA TCAGCGACTC 1500
1501 GGCGTTGTCC GGGGTGTCCG TCAGCCCGGC GGCCGTCGGG GCGTCGCCCT TCTCCCAGGC 1560

FIGURE 2-3

3241 CGTGGAGCAC TTCGAGACCG CGACCGCCTC CTTCGGGGCC AAGCAGCGCC ACGACATCGA 3300
3301 GCCGCTGCGC GCCCGGATCG CGGAGTTCCT GGCCGACCCG GAGACCTACG AGGACGGCAT 3360
3361 GCGCGTCCAC CAGGTCATCG ACTCCATGAA CACCGTCATG GAGGAGGCCG CCGAGCCCGG 3420
3421 CGAGGGCACG ATCGTCTCCG ACATCGGCTT CTTCCGTCAC TACGGTGTGC TCTTCGCCCG 3480
3481 CGCCGACCAG CCCTTCGGCT TCCTCACCTC GGCGGGCTGC TCCAGCTTCG GCTACGGCAT 3540
3541 CCCCGCCGCC ATCGGCGCCC AGATGGCCCG CCCGGACCAG CCGACCTTCC TCATCGCGGG 3600
3601 TGACGGCGGC TTCCACTCCA ACAGCTCCGA CCTGGAGACC ATCGCCCGGC TCAACCTGCC 3660
3661 GATCGTGACC GTCGTCGTCA ACAACGACAC CAACGGCCTG ATCGAGCTGT ACCAGAACAT 3720
3721 CGGTCACCAC CGCAGCCACG ACCCGGCGGT CAAGTTCGGC GGCGTCGACT TCGTCGCGCT 3780
3781 CGCCGAGGCC AACGGTGTCG ACGCCACCCG CGCCACCAAC CGCGAGGAGC TGCTCGCGGC 3840
3841 CCTGCGCAAG GGTGCCGAGC TGGGTCGTCC GTTCCTCATC GAGGTCCCGG TCAACTACGA 3900
End of ORF 2--> Beginning of ORF 3-->
3901 CTTCCAGCCG GGCGGCTTCG GCGCCCTGAG CATCTGAtcA TGGGGGCACC GGTTCTTCCG 3960
3961 GCTGCCTTCG GGTTCCTGGC CTCCGCCCGA ACGGGCGGGG GCCGGGCCCC CGGCCCGGTC 4020
4021 TTCGCGACCC GGGGCAGCCA CACCGACATC GACACGCCCC AGGGGGAGCG CTCGCTCGCG 4080
4081 GCGACCCTGG TGCACGCCCC CTCGGTCGCG CCCGACCGCG CGGTGGCGCG CTCCCTCACC 4140
4141 GGCGCGCCCA CCACCGCGGT GCTCGCCGGT GAGATCTACA ACCGGGACGA ACTCCTCTCC 4200
4201 GTGCTGCCCG CCGGACCCGC GCCGGAGGGG GACGCGGAGC TGGTCCTGCG GCTGCTGGAA 4260
4261 CGCTATGACC TGCATGCCTT CCGGCTGGTG AACGGGCGCT TCGCGACCGT GGTGCGGACC 4320
4321 GGGGACCGGG TCCTGCTCGC CACCGACCAC GCCGGTTCGG TGCCGCTGTA CACCTGTGTG 4380
4381 GCGCCGGGCG AGGTCCGGGC GTCCACCGAG GCCAAGGCGC TCGCCGCGCA CCGCGACCCG 4440
4441 AAGGGCTTCC CGCTCGCGGA CGCCCGCCGG GTCGCCGGTC TGACCGGTGT CTACCAGGTG 4500
4501 CCCGCGGGCG CCGTGATGGA CATCGACCTC GGCTCGGGCA CCGCCGTCAC CCACCGCACC 4560
4561 TGGACCCCGG GCCTCTCCCG CCGCATCCTG CCGGAGGGCG AGGCCGTCGC GGCCGTGCGG 4620
4621 GCCGCGCTGG AGAAGGCCGT CGCCCAGCGG GTCACCCCCG GCGACACCCC GTTGGTGGTG 4680
4681 CTCTCCGGCG GAATCGACTC CTCCGGGGTC GCGGCCTGTG CGCACCGGGC GGCCGGGGAA 4740
4741 CTGGACACGG TGTCCATGGG CACCGACACG TCCAACGAGT TCCGCGAGGC CCGGGCGGTC 4800
4801 GTCGACCATC TGCGCACCCG GCACCGGGAG ATCACCATCC CGACCACCGA GCTGCTGGCG 4860

FIGURE 2-5

End of ORF 4-->
6541 GATCGGTGCG GAACTGCTCT ACCAGTACGC CCGAGCCCAC AGAACCCAGT TGTGAoggag 6600
6601 ocatcgtgtc 5TGicCTCTC°{GSTAGf TM CTGCACCCCG TACCGCGACG AGCTGCTCGC 6660
6661 GCTCGCCTCC GAGCTTCCCG AGGTGCCGCG CGCGGACCTC CATGGCTTCC TCGACGAGGC 6720
6721 GAAGACGCTG GCCGCCCGTC TCCCGGAGGG GCTGGCCGCC GCTCTCGACA CCTTCAACGC 6780
6781 CGTGGGCAGC GAGGACGGTT ATCTGCTGCT GCGCGGGCTG CCCGTCGACG ACAGCGAGCT 6840
6841 GCCCGAGACG CCGACCTCCA CCCCGGCCCC GCTGGACCGC AAGCGGCTGG TGATGGAGGC 6900
6901 CATGCTCGCG CTGGCCGGCC GCCGGCTCGG TCTGCACACG GGGTACCAGG AGCTGCGCTC 6960
6961 GGGCACGGTC TACCACGACG TGTACCCGTC GCCCGGCGCG CACTACCTGT CCTCGGAGAC 7020
7021 CTCCGAGACG CTGCTGGAGT TCCACACGGA GATGGCGTAC CACATCCTCC AGCCGAACTA 7080
7081 CGTCATGCTG GCCTGCTCCC GCGCGGACCA CGAGAACCGG GCGGAGACGC TGGTCGGCTC 7140
7141 GGTCCGCAAG GCGCTGCCCC TGCTGGACGA GAAGACCCGG GCCCGTCTCT TCGACCGCAA 7200
7201 GGTGCCCTGC TGCGTGGACG TGGCCTTCCG CGGCGGGGTC GACGACCCGG GCGCGATCGC 7260
7261 CAACGTCAAG CCGCTCTACG GGGACGCGAA CGACCCGTTC CTCGGGTACG ACCGCGAGCT 7320
7321 GCTGGCGCCG GAGGACCCCG CGGACAAGGA GGCCGTCGCC CATCTGTCCC AGGCGCTCGA 7380
7381 CGATGTGACC GTCGGGGTGA AGCTCGTCCC CGGTGACGTC CTCATCATCG ACAACTTCCG 7440
7441 CACCACGCAC GCGCGGACGC CGTTCTCGCC CCGCTGGGAC GGGAAGGACC GCTGGCTGCA 7500
7501 CCGCGTCTAC ATCCGCACCG ACCGCAATGG ACAGCTCTCC GGCGGCGAGC GCGCGGGCGA 7560
End of ORF 5-->
7561 CACCATCTCG TTCTCGCCGC GCCGCTGAgc ccggctcccc gqggccctgg gccccggcgc 7620
7621 cggaoccggc tcccggtcct gccccctcac ccgccgcgcg ggtgaggggg caggcccctt 7680
7681 tgtgccgggt gccgtgcgtc ctgcgogggt gccggggcgg gggggacggc ggaggtgccc 7740
7741 ggcggccggg tgccgtgcgc cgcccgtggg tgctgtacag cactccgtgt gccgtgcgcc 7800
7801 accccgtgco taaatttgcc actctotggg aaataotgca gagtgcgacg ggtgaggccg 7860
Beginning of ORF 6-->
7861 tcgccgtgcc ctttccgtga caggagocgc tgacATGTCC GACAGCACAC CGAAGACGCC 7920
7921 CCGGGGATTC GTGGTGCACA CGGCGCCGGT GGGCCTGGCC GACGACGGCC GCCACGACTT 7980
7981 CACCGTCCTC GCCTCCACCG CCCCGGCCAC CGTGAGCGCC GTCTTCACCC GCTCCCGCTT 8040
8041 CGCCGGGCCG AGCGTCGTGC TGTGCCGGGA GGCGGTGGCC GACGGGCAGG CGCGCGGTGT 8100
8101 GGTGGTGCTG GCCCGCAACG CGAATGTCGC GACCGGCCTG GAGGGCGAGG AGAACGCGCG 8160

9841 TACCGGCTGC GGCCCGTGGC GACCGGCCCG TACCGGATCG TCTCGTACAC CCGGGGCGAG 9900
9901 CTGGCCGTCC TGGAGCCCAA TCCGCACTGG GACCCCGAGA CCGACCCGGT GCGCGTCCAG 9960
9961 CGCGCCTCCC GGATCGAGGT GCACCTCGGC AAGGACCCGC ACGAGGTGGA CCGCATGCTG 10020
10021 CTGGCGGGCG AGGCCCATGT GGACCTCGCG GGCTTCGGTG TGCAGCCCGC GGCCCAGGAG 10080
10081 CGCATCCTCG CCGAGCCGGA GCTGCGCGCG CACGCGGACA ACCCGCTGAC CGGCTTCACC 10140
10141 TGGATCTACT GCCTGTCGAG CCGGATCGCC CCGTTCGACA ATGTGCACTG CCGGCGGGCC 10200
10201 GTGCAGTTCG CCACCGACAA AGCGGCCATG CAGGAGGCGT ACGGCGGCGC GGTGGGCGGC 10260
10261 GACATCGCGA CCACCCTGCT GCCCCCGACC CTCGACGGCT ACAAGCACTT CGACCGCTAC 10320
10321 CCGGTCGGCC CCGAGGGCAC CGGCGACCTG GAGGCCGCCC GCGCCGAGCT GAAGCTGGCC 10380
10381 GGGATGCCCG ACGGCTTCCG CACCAGGATC GCCGCCCGCA AGGACCGGCT CAAGGAGTAC 10440
10441 CGGGCCGCCG AGGCGCTGGC CGCCGGGCTC GCCCGGGTCG GCATCGAGGC GGAGGTGCTG 10500
10501 GACTTCCCGT CGGGCGACTA CTTCGACCGC TACGGCGGCT GCCCGGAGTA TCTGCGCGAG 10560
10561 CACGGGATCG GGATCATCAT GTTCGGCTGG GGCGCCGACT TCCCCGACGG ATACGGCTTC 10620
10621 CTCCAGCAGA TCACCGACGG GCGCGCGATC AAGGAGCGCG GCAACCAGAA CATGGGCGAG 10680
10681 CTGGACGACC CGGAGATCAA CGCGCTGCTG GACGAGGGGG CGCAGTGCGC CGACCCGGCG 10740
10741 CGGCGCGCGG AGATCTGGCA CCGCATCGAC CAGCTCACGA TGGACCACGC GGTCATCGTT 10800
10801 CCGTATCTGT ACCCGCGGTC CCTGCTCTAC CGGCACCCGG ACACCCGCAA CGCCTTCGTC 10860
End of uKr irwion
10861 ACCGGCTCCT TCGGGATGTA CGACTACGTG GCGCTCGGCG CGAAGTGAgc acggggtccg 10920
10921 gccccgggac egtatgtccc ggggccggac cccgcccgtt ccccgcccgg tccggtccgg 10980
10981 acccggtcgc ggcccgcKi'SJclGA^JlC® CGGGCCCCGG CCGCGACCCC GCGCCGGATC 11040
11041 GGCCAGTGGC CCTGCGCCAG GGGCCGTTCC ACGCTGCGGC AGGCGAGAGC GGCCTCGCGS 11100
11101 AACTCCGCCT CGTACAGCGC GAGCTGGCGC AGGAACTGCC GGGTCGGGCC GGTCAGGCTG 11160
11161 GTCCCCCGCG GGCTGCGCAG CAGCAGCCGG GCGCCGAGGG ACTGCTCCAG CCGGTGAATC 11220
11221 CGGCGGGTGA GCGCCGACTG GCTGATCGAC AGCACCGCCG CGGCCCGGTT GATGCTGCCG 11280
11281 TGCCGGGCCA CGGCCTGGAG CAGATGGAGA TCGTCCACAT CCAGTTTGCG GCCCTCGGCC 11340
11341 TGGCCGGGCA CGGAGCCCTG GTCGGGTCCC GCCCCGAAGC GGCGGGCGTC CGCGCCGGTG 11400
11401 CGCTCCGCGT ACCACTGCGC CCACCAGGGC TCGTCCAGCA GGTCGCGGTG GTGTTCGGCG 11460
11461 AAGCGCCGGA GCTGGACCTC GGCGATCAGC GCGGCCAGCC GTCCCGCCAG CGCCCGGGGC 11520

13201 CCCGGCGGCG GTCAGCTCGT CACCCAGGGC GCGCAGCTTC TCGACCCGGC GCGCGGCGAT 13260
13261 GGCCACGGCG GCGCCCTCGG CGGCCAGGGC GCGGGCCGTG GCCTCGCCGA TGCCCGAGCT 13320
<-- Beginning of ORF 9
13321 CGCGCCCGTG ATGAGCGCGA CTTTCCCCTG GAGTGCGGAT GGCATcattt cctccacatg 13380
13381 gtgctgcgat cgtggtgagc gtatgaagaa ggggtgagac ctgccgtgcc ggggcgggtt 13440
13441 ccgtacgccg gaccgttgcg gtgggcacgg ccgaccgggt acggatggcc gcagttcccc 13500
13501 ggggagticc cggggaatgg tgaataccgc ggcgctctcc gatggtcttc ggaggacacc 13560
13561 cggggot-tcQ ccgggoatca gcggccggog ttctccccgt ccacggcago cgctotcagc 13620
13621 gtcgcQt-tcc ccggtgaoU cccttcggtg gaccgggtta tgoctgtttc cgccgggtta 13680
13681 tgcgcgccgc cccggcggoc cggccacccg ^ ^=^99999= Jj^Qcggcogatt gggcgccacg 13740
13741 ocatggcgcg agcagcgotc ggcggtggAT GATGAACGAG GCAGCGCCTC AGTCCGACCA 13800
13801 GGTGGCACCG GCGTATCCGA TGCACCGGGT CTGCCCGGTC GACCCGCCGC CGCAACTGGC 13860
13861 CGGGCTGCGG TCCCAGAAGG CCGCGAGCCG GGTGACGCTG TGGGACGGCA GCCAGGTGTG 13920
13921 GCTGGTGACC TCGCACGCCG GGGCCCGGGC CGTCCTGGGC GACCGCCGCT TCACCGCGGT 13980
13981 GACGAGCGCG CCCGGCTTCC CGATGCTGAC CCGCACCTCC CAACTGGTGC GCGCCAACCC 14040
14041 GGAGTCGGCG TCGTTCATCC GCATGGACGA CCCGCAGCAC TCCCGGCTGC GCTCGATGCT 14100
14101 CACCCGGGAC TTCCTGGCCC GCCGCGCCGA GGCGCTGCGC CCCGCGGTGC GGGAGCTGCT 14160
14161 GGACGAGATC CTGGGCGGGC TGGTGAAGGG GGAGCGGCCG GTCGACCTGG TCGCCGGACT 14220
14221 GACGATCCCG GTGCCCTCGC GGGTCATCAC CCTGCTCTTC GGCGCCGGTG ACGACCGCCG 14280
14281 GGAGTTCATC GAGGACCGCA GCGCGGTCCT CATCGACCGC GGCTACACCC CGGAGCAGGT 14340
14341 CGCCAAGGCC CGGGACGAAC TCGACGGCTA TCTGCGGGAG CTGGTCGAGG AGCGGATCGA 14400
14401 GAACCCGGGC ACCGACCTGA TCAGCCGGCT CGTCATCGAC CAGGTGCGGC CGGGGCAICT 14460
14461 GCGGGTCGAG GAGATGGTCC CGATGTGCCG GCTGCTGCTG GTGGCCGGTC ACGGCACCAC 14520
14521 CACCAGCCAG GCGAGCCTGA GCCTGCTCAG CCTGCTCACC GACCCGGAGC TGGCCGGGCG 14580
14581 CCTCACCGAG GACCCGGCCC TGCTGCCCAA GGCGGTCGAG GAGCTGCTGC GCTTCCACTC 14640
14641 CATCGTGCAG AACGGGCTGG CCCGTGCCGC GGTGGAGGAC GTCCAGCTCG ACGATGTGCT 14700
14701 CATCCGGGCG GGCGAGg'gCG TGGTGCTGTC GCTGTCGGCG GGCAACCGGG ACGAGACGGT 14760
14761 CTTCCCCGAC CCGGACCGGG TGGACGTGGA CCGCGACGCC CGCCGCCATC TCGCCTTCGG 14820
14821 CCACGGCATG CACCAGTGCC TGGGCCAGTG GCTGGCCCGG GTGGAGCTGG AGGAGATCCT 14880

1 MSRVSTWSG KPTAAHALLS RLRDHGVGKV
61 GVAADVLARI TGRPQACWAT LGPGMINLST 
121 QSLDSVAIVA PMSLTfAVEUJ RPHEITOLVD 
181 TIVPNPPANT PAKPVGWAD GWQKAADQAA 
241 ERLNIPVITT YIAKGVLPVG HELNYGAVTG 
301 DLRPSMWQKG lEKKTVRISP TVNPIPRVYR 
361 lEPLRARIAE FLADPETiTED GMRVHQVIDS 
421 ARADQPPGFL TSAGCSSFGY GIPAAIGA04 
481 LPIVTVWNN CflNGLIELYQ NIGHHRSHDP 
541 AALRKGAELG RPFLIEVPVN YDFQPQGFGA 

FGWGREAAS ILFDEVDPID FVLTRHEFTA 60
GIATSVLDRS PVIALAAQSE SHDIFPNDIH 120 
SAVNAAMTEP VGPSFISLPV DU/SSSEGID 180 
ALLAEAKHPV LWGAAAIRS GAVPAIRAIA 240 
WDGILNFPA LQflMFAPVDL VLTVGYDWffi 300 
■pDVDWTDVL AFVEHFETAT ASFGAKQRHD 360 
MNTVMEEAAE PGEGTIVSDI GFFRHYGVLF 420 
ARPDQPTFLI AGDGGFHSNS SDLBTIARIH 480 
AVKFQGVDFV ALAERNGVDA TRATNREELL 540 

1 MMNEAAPQSD QVAPAYPMHR VCPVDPPPQL
61 AVLGDRRFTA VTSAPGFEML TOTSQLVRAN 